Automatic Keyword Extraction for News Finder

نویسندگان

  • José Luis Martínez-Fernández
  • Ana M. García-Serrano
  • Paloma Martínez
  • Julio Villena-Román
چکیده

Newspapers are one of the most challenging domains for information retrieval systems: new articles appear everyday written in different languages, with multimedia contents and the news repositories may be updated in a matter of hours so information extraction is crucial to the metadata contents of the news. Further approaches of “smart retrieval” have to cope with multimedia and multilingual features as well as have to obtain really good precision features in order to reach a high degree of user satisfaction with the retrieved documents. The paper focus is the description of the automatic keyword extraction (AKE) process for news characterization that uses several linguistic techniques to improve the current state of the text-based information retrieval1. The first prototype implemented focusing in the AKE process (www.omnipaper.org) is described and some relevant performance features are included. Finally, some conclusions and comments are given regarding the role of the linguistic engineering in the web era.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Lexicon Analysis Based Automatic News Classification Approach – A Review

The news classification approach is the primary approach for the online news portals with the news data sourced from the various portals. The various types of data is received and accepted over the news classification portals. The lexicon analysis plays the key role in the categorization of the news automatically using the automatic news category recognition by analyzing the keyword data extrac...

متن کامل

Automatic Keyword Extraction for Text Summarization: A Survey

In recent times, data is growing rapidly in every domain such as news, social media, banking, education, etc. Due to the excessiveness of data, there is a need of automatic summarizer which will be capable to summarize the data especially textual data in original document without losing any critical purposes. Text summarization is emerged as an important research area in recent past. In this re...

متن کامل

Automatic news summarization method based Maximal Marginal Relevance

Automatic summary, as the name suggests, is from a single article or multiple articles, the removal point to summarize the article to the effect of technology. It plays an important role in machine learning and data mining in order to obtain higher quality news summaries, news summaries proposed extraction method based on the theme of integration of mmr and gsvm. In news text recorded as a proc...

متن کامل

Automatic Keyword Extraction on Twitter

In this paper, we build a corpus of tweets from Twitter annotated with keywords using crowdsourcing methods. We identify key differences between this domain and the work performed on other domains, such as news, which makes existing approaches for automatic keyword extraction not generalize well on Twitter datasets. These datasets include the small amount of content in each tweet, the frequent ...

متن کامل

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003